ABSTRACT

There are 512 two-locus, two-allele, two-phenotype, fully-penetrant disease models.
Using the permutation between two alleles, between two loci, and between being affected
and unaffected, one model can be considered to be equivalent to another model under
the corresponding permutation. These permutations greatly reduce the number of two-locus
models in the analysis of complex diseases. This paper determines the number of
non-redundant two-locus models (which can be 102, 100, 96, 51, 50, or 48, depending on
which permutations are used, and depending on whether zero-locus and single-locus models
are excluded). Whenever possible, these non-redundant two-locus models are classified
by their properties. Besides the familiar features of multiplicative models (logical AND),
heterogeneity models (logical OR), and threshold models, new classifications are added or
expanded: modifying-effect models, logical XOR models, interference and negative interference
models (neither dominant nor recessive), conditionally dominant/recessive models,
missing lethal genotype models, and highly symmetric models. The following aspects of
two-locus models are studied: the marginal penetrance tables at both loci, the expected
joint identity-by-descent probabilities, and the correlation between marginal identity-by-descent
probabilities at the two loci. These studies are useful for linkage analyses using
single-locus models while the underlying disease model is two-locus, and for correlation
analyses using the linkage signals at different locations obtained by a single-locus model.
1 Introduction



Disease models involving two genes, usually called "two-locus models" (e.g. […]),
have been widely used in the study of complex diseases, including likelihood-based linkage
analysis […], allele-sharing-based linkage analysis [39, …], the marker-association-segregation
method [14], the weighted-pairwise correlation method […], variance
component analysis [84, 86], recurrence risk of relatives […, 74, 37], and segregation
analysis […]. Besides human genetics, two-locus models have also been
used in the study of evolution, as well as genetic studies of inbreeding animals and plants.

Using two-locus models is a natural choice if the underlying disease mechanism indeed
involves two or more genes, though there have been extensive discussions of the power
of using single-locus models for linkage analysis in that situation […, 69, 15, 40].
Also, two-locus models have frequently been used in generating
simulated datasets for testing various linkage methods and strategies [27, 11, 37, 12, …].
Although segregation analysis based on two-locus models is common
[82, 28, 70, 93, …, 43], linkage analysis based on two-locus models is relatively rare, due to
the large number of combinations of two markers out of as many as 300 markers in the
whole genome, the cost of a time-consuming calculation of the pedigree likelihood,
and the large number of possible interactions between two genes.

One would naturally ask: how many possible types of two-locus models exist? Complete
enumerations and classifications of systems have been used in many other fields as a
starting point of a study; for example, two-person two-move games in the study of game
theory [73], two-state three-input cellular automata in the study of dynamical systems
[55], and two-symbol 3-by-3 lattice models in the study of protein folding […]. These
types of studies lay out the space of all possibilities, with nothing missing. This paper
follows a similar path in completely enumerating all two-locus two-allele two-phenotype
disease models.



Strickberger [83] listed a few types of two-locus models encountered in experimental
systems, though the number of phenotypes is multiple (such as being a smooth, partly
rough and fully rough Mendelian pea), instead of binary (such as affected and unaffected).
Defrise-Gussenhoven […] listed five types of two-locus models, which were followed up by
a study by Greenberg […]. Neuman and Rice listed six two-locus models […].
Nevertheless, nobody provided a complete list of all possible two-locus models.

This complete enumeration of all two-locus models can be useful when a linkage signal
is observed in two separated regions, or if two candidate genes with known locations are
studied. In these situations, it is of interest to determine the nature of the interaction
between the two disease genes (e.g. […]). Without knowing all possible forms of interaction,
such determination is not complete.

A list of all two-locus models is perhaps useful for likelihood-based linkage analysis, 
but may not be essential. In such a linkage analysis, parameters in the two-locus model 
can be determined by a maximum likelihood method, and the fitted values are generally 
continuous rather than discrete. The enumeration of two-locus models in this paper, 
however, uses discrete parameter values. Nevertheless, during the stage of interpretation 
of the result, the classification of two-locus models discussed in section 3 can be useful. 

Since most likelihood-based linkage analyses still use single-locus disease models, it is
of interest to know how closely a single-locus model approximates a two-locus model. For
this purpose, we examine the marginal penetrance (on both loci) of all two-locus models,
which should be the optimal parameter value if a single-locus model is used for the linkage
analysis […]. The question of which two-locus models can be reasonably approximated by
single-locus models, or which two-locus interaction can be detected by single-locus linkage
analysis, can be easily answered by this marginal penetrance information. This topic will
be discussed in section 4.

Allele-sharing-based linkage analysis requires a calculation of the expected allele sharing
between a relative pair under a certain disease model […]. We provide a
new formulation for this calculation which is an extension of the classical Li-Sacks method
[52, 51], which in turn is based on Bayes' theorem. This topic will be discussed in
section 5.

It has been suggested that interaction or epistasis between two regions can be detected
by calculating the correlation between two linkage signals, each determined by a single-locus
linkage analysis [10]. A positive correlation may suggest interaction (epistasis),
and a negative correlation may suggest heterogeneity […, 10]. We examine such a correlation
for all two-locus models, which not only confirms this simple rule-of-thumb, but also
generalizes it to other two-locus models. This topic will be discussed in section 6.



2 Enumeration of two-locus models 

A two-locus model is typically represented by a 3-by-3 penetrance table. The row label 
gives the three possible genotypes of the first disease locus (i.e. aa,aA,AA, where A might 
be considered as the disease allele at locus 1), and the column label gives the genotypes 
for the second locus (i.e. bb, bB, BB, where B is the disease allele at locus 2):

\[
\{f_{ij}\} =
\begin{array}{c|ccc}
   & bb     & bB     & BB     \\ \hline
aa & f_{11} & f_{12} & f_{13} \\
aA & f_{21} & f_{22} & f_{23} \\
AA & f_{31} & f_{32} & f_{33}
\end{array}
\qquad (1)
\]

The table element f_{ij} ("penetrance") is the probability of being affected with the disease
when the genotype at the first locus is i and that at the second locus is j. In the most
general case, the f_{ij}'s range from 0 to 1. Models defined on continuously varying parameters
are hard to classify into a few discrete categories. On the other hand, if the allowed
values of the f_{ij}'s are 0 and 1 only ("fully penetrant"), the nine-parameter
space reduces to 2^9 = 512 distinct points. We use the following notation to label each of these 512
fully-penetrant two-locus models:

\[
\text{"model number"}_{10} = (f_{11} f_{12} f_{13} f_{21} f_{22} f_{23} f_{31} f_{32} f_{33})_2 \qquad (2)
\]

where the subscript 2 or 10 indicates whether the number is represented in binary
or decimal. For example, if a model has f_{13} = 1 and all other f_{ij}'s are zero, the binary
representation of the penetrance table is (001000000)_2, which is 64 in decimal notation, or
model M64. Model numbers range from 0 to 511.
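
The labeling in Eq. (2) can be sketched in a few lines of Python. This is only an illustration of the notation; the function names `model_to_table` and `table_to_model` are our own and are not taken from any existing package.

```python
def model_to_table(number):
    """Return the 3x3 penetrance table of a fully-penetrant model.

    Rows are the genotypes aa, aA, AA at the first locus; columns are
    bb, bB, BB at the second locus, following Eq. (2).
    """
    if not 0 <= number <= 511:
        raise ValueError("model number must be between 0 and 511")
    bits = format(number, "09b")  # f11 f12 f13 f21 f22 f23 f31 f32 f33
    return [[int(bits[3 * i + j]) for j in range(3)] for i in range(3)]


def table_to_model(table):
    """Inverse mapping: read the nine entries row by row as one binary number."""
    digits = "".join(str(table[i][j]) for i in range(3) for j in range(3))
    return int(digits, 2)


# The example from the text: only f13 equals 1, which is model M64.
print(model_to_table(64))
```

Reading the table row by row means that the entry for aa-bb is the most significant bit and the entry for AA-BB the least significant one.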

The number of non-redundant two-locus models is less than 512 due to the following
considerations: (i) if all f_{ij}'s are 0 (or 1), the model is a zero-locus model; (ii) if the
elements of the penetrance table do not change with row (or with column), it is a single-locus
model; moreover, the nature of the model should not change (iii) if the first and second loci are
exchanged; (iv) if the two alleles at the first (or second) locus are exchanged; or (v) if the
affection status is exchanged. We will show below that when the symmetries implied by
permutations (iii) and (iv) are imposed, the number of non-redundant two-locus models (N_1)
is 102; when (iii), (iv), and (v) are considered, the number (N_2) is 51. Subtracting zero-locus
and/or single-locus models, we get N_1 - 2 = 100, N_1 - 6 = 96, N_2 - 1 = 50, and N_2 - 3 = 48.

This count of non-redundant two-locus models is based on the counting
theorem of Polya and de Bruijn […]. Cotterman pioneered combinatorial genetics, but
he only enumerated single-locus multiple-allele models […]. Although Hartle and Maruyama
had already applied the counting theorem to enumerate genetic models […], we would like
to repeat and simplify the derivation to focus on our particular case, i.e., the two-locus
two-allele models.

To do so, it is necessary to review the concept of "cycle index". If a permutation
is applied to a set of m elements, some elements are invariant under this permutation (b_1
of them), some form cycles of length 2 (b_2 such cycles), some form cycles of length 3 (b_3
such cycles), etc. For each permutation, construct a monomial in m variables:

\[
x_1^{b_1} x_2^{b_2} x_3^{b_3} \cdots x_m^{b_m}.
\]

Going through all permutations p that are part of the permutation group P (suppose the
number of permutations is |P|), the cycle index is defined as the polynomial:

\[
CI(x_1, x_2, \ldots, x_m) = \frac{1}{|P|} \sum_{p \in P} x_1^{b_1} x_2^{b_2} x_3^{b_3} \cdots x_m^{b_m}.
\]

For two-locus models, there are 9 genotypes, and eight permutations can be considered
on this set of genotypes: (i) the identity operation; (ii) exchange alleles a and A; (iii)
exchange alleles b and B; (iv) exchange the first and the second locus; (v) is (ii) plus (iii);
(vi) is (ii) plus (iv); (vii) is (iii) plus (iv); (viii) is (v) plus (iv). The cycle index for this
group of eight permutations on the 9 genotypes is:

\[
CI(x_1, x_2, \ldots, x_9) = \frac{1}{8} \left( x_1^9 + 4 x_1^3 x_2^3 + x_1 x_2^4 + 2 x_1 x_4^2 \right).
\]



By Polya's counting theorem (theorem 5.1 in […]), the number of non-redundant
two-locus models, without considering permutations in phenotype, is equal to the cycle index
of the permutation group on the genotypes evaluated by replacing all variables by the
number of phenotypes (which is 2), i.e.:

\[
N_1 = \frac{2^9 + 2^8 + 2^5 + 2^4}{8} = 102.
\]
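
As a check on this evaluation, the cycle structure of the eight permutations can be computed directly. The sketch below is our own illustration, not code from the original study: it represents each genotype as a (row, column) cell of the 3-by-3 table, builds the eight permutations from the three exchanges, and evaluates the cycle index with every variable set to the number of phenotypes. All function names are illustrative.

```python
from collections import Counter
from itertools import product

# The nine two-locus genotypes as (row, column) cells of the penetrance table.
CELLS = [(i, j) for i in range(3) for j in range(3)]


def group():
    """The eight cell permutations generated by the allele and locus exchanges."""
    perms = set()
    for swap_a, swap_b, swap_loci in product((False, True), repeat=3):
        image = []
        for row, col in CELLS:
            if swap_a:          # exchange alleles a and A: reverse the rows
                row = 2 - row
            if swap_b:          # exchange alleles b and B: reverse the columns
                col = 2 - col
            if swap_loci:       # exchange the two loci: transpose the table
                row, col = col, row
            image.append((row, col))
        perms.add(tuple(image))
    return sorted(perms)


def cycle_type(perm):
    """Counter mapping cycle length to the number of cycles of that length."""
    mapping = dict(zip(CELLS, perm))
    seen, lengths = set(), Counter()
    for start in CELLS:
        if start in seen:
            continue
        length, cell = 0, start
        while cell not in seen:
            seen.add(cell)
            cell = mapping[cell]
            length += 1
        lengths[length] += 1
    return lengths


def count_models(n_phenotypes=2):
    """Evaluate the cycle index with all variables set to n_phenotypes."""
    perms = group()
    total = sum(n_phenotypes ** sum(cycle_type(p).values()) for p in perms)
    return total // len(perms)


print(count_models())  # prints 102
```

Each permutation contributes 2 raised to its total number of cycles, so the sum inside `count_models` is 2^9 + 4 x 2^6 + 2^5 + 2 x 2^3 = 816.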

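When all 0's in the penetrance table are switched to 1 and all 1's switched to 0, one
two-locus model becomes another two-locus model. (A table with k ones becomes a table with
9 - k ones, so no model is equivalent to its own switched version.) If we consider these two
models as equivalent, the number of non-redundant models is

\[
N_2 = \frac{N_1}{2} = 51.
\]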

Actually, the same conclusion can be obtained by considering not only the cycle index
of the permutation group on the genotypes, but also that of a permutation group on the
phenotypes, then using de Bruijn's generalization of Polya's theorem (see Appendix 1).
The advantage of this approach is that if a more complicated permutation group applied
to the phenotypes is considered, the method of getting N_2 by a simple division of N_1 would not
work.
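
Both counts can also be confirmed by brute force, without the counting theorem. The following sketch is our own illustration under the conventions of Eq. (2); the helper names are hypothetical. It groups all 512 tables into equivalence classes and picks the smallest model number of each class as its representative.

```python
def images(bits):
    """Tables reachable from `bits` (a row-major 9-tuple of 0/1) by exchanging
    alleles at either locus and/or exchanging the two loci."""
    table = [bits[0:3], bits[3:6], bits[6:9]]
    found = set()
    for _ in range(2):
        for _ in range(4):
            # A 90-degree rotation is a locus exchange combined with an allele exchange.
            table = [tuple(row) for row in zip(*table[::-1])]
            found.add(sum(table, ()))
        table = [tuple(row) for row in zip(*table)]  # exchange the two loci
    return found


def representative(number, with_affection_swap=False):
    """Smallest model number among all models equivalent to `number`."""
    bits = tuple(int(c) for c in format(number, "09b"))
    candidates = images(bits)
    if with_affection_swap:
        # Also allow the exchange of affected and unaffected (0 <-> 1).
        candidates |= images(tuple(1 - b for b in bits))
    return int("".join(map(str, min(candidates))), 2)


def count_classes(with_affection_swap=False):
    return len({representative(n, with_affection_swap) for n in range(512)})


print(count_classes(False), count_classes(True))  # prints 102 51
```

For instance, `representative(255, True)` returns 1, reflecting the equivalence of the D+D model M255 and the RR model M1.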
3 Classifying two-locus models

This section discusses some possible classification schemes for two-locus models. No
attempt is made to exhaustively classify all models, considering the fact that some "exotic"
models can never be classified using familiar terms. What we have here is a collection of
classification schemes, each selecting a subset of models by a special property they possess.
As a comparison, out of the 50 models listed in this paper, Defrise-Gussenhoven studied
M1, M3, M11, M15, M27 […]; Greenberg studied M1, M3, M27 […]; and Neuman and Rice
studied M1, M3, M11, M15, M27, M78 […]. All N_2 - 1 = 50 models are listed in Table
1. The N_1 - N_2 - 1 = 50 models generated by switching affecteds and unaffecteds (plus
possibly other permutations between loci and alleles) are listed in Table 2 for convenience.
We first review the six models studied by Neuman and Rice:



1. Jointly-recessive-recessive model (RR)

M1 requires two copies of the disease allele at both loci to be affected. This
model was studied as early as 1952 […] and can also be called "recessive
complementary".

2. Jointly-dominant-dominant model (DD)

M27 requires at least one copy of the disease allele at both loci to be affected. This
model can also be called "dominant complementary".

3. Jointly-recessive-dominant model (RD)

M3 requires two copies of the disease allele at the first locus and at least one disease
allele at the second locus to be affected.

Note that the heterogeneity models (logical OR models) discussed in […] are
equivalent to the above three RR, DD, RD models by the 0↔1 permutation in
the penetrance table plus possibly some permutations between two loci and/or two
alleles. The RR model becomes the D+D model, DD becomes R+R, and RD becomes
D+R […].

4. A modifying-effect model (Mod)

M15 can be modified to a single-locus recessive model if the penetrance at the
genotype aA-BB is changed from 1 to 0. This model is one of the "modifying-effect
models" and "almost single-locus models" discussed below.

5. Threshold model (T)
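Model         aa       aA       AA
M1 (RR)       0 0 0    0 0 0    0 0 1
M2            0 0 0    0 0 0    0 1 0
M3 (RD)       0 0 0    0 0 0    0 1 1
M5            0 0 0    0 0 0    1 0 1
M7 (1L:R)     0 0 0    0 0 0    1 1 1
M10           0 0 0    0 0 1    0 1 0
M11 (T)       0 0 0    0 0 1    0 1 1
M12           0 0 0    0 0 1    1 0 0
M13           0 0 0    0 0 1    1 0 1
M14           0 0 0    0 0 1    1 1 0
M15 (Mod)     0 0 0    0 0 1    1 1 1
M16           0 0 0    0 1 0    0 0 0
M17           0 0 0    0 1 0    0 0 1
M18           0 0 0    0 1 0    0 1 0
M19           0 0 0    0 1 0    0 1 1
M21           0 0 0    0 1 0    1 0 1
M23           0 0 0    0 1 0    1 1 1
M26           0 0 0    0 1 1    0 1 0
M27 (DD)      0 0 0    0 1 1    0 1 1
M28           0 0 0    0 1 1    1 0 0
M29           0 0 0    0 1 1    1 0 1
M30           0 0 0    0 1 1    1 1 0
M40           0 0 0    1 0 1    0 0 0
M41           0 0 0    1 0 1    0 0 1
M42           0 0 0    1 0 1    0 1 0
M43           0 0 0    1 0 1    0 1 1
M45           0 0 0    1 0 1    1 0 1
M56 (1L:I)    0 0 0    1 1 1    0 0 0
M57           0 0 0    1 1 1    0 0 1
M58           0 0 0    1 1 1    0 1 0
M59           0 0 0    1 1 1    0 1 1
M61           0 0 0    1 1 1    1 0 1
M68           0 0 1    0 0 0    1 0 0
M69           0 0 1    0 0 0    1 0 1
M70           0 0 1    0 0 0    1 1 0
M78 (XOR)     0 0 1    0 0 1    1 1 0
M84           0 0 1    0 1 0    1 0 0
M85           0 0 1    0 1 0    1 0 1
M86           0 0 1    0 1 0    1 1 0
M94           0 0 1    0 1 1    1 1 0
M97           0 0 1    1 0 0    0 0 1
M98           0 0 1    1 0 0    0 1 0
M99           0 0 1    1 0 0    0 1 1
M101          0 0 1    1 0 0    1 0 1
M106          0 0 1    1 0 1    0 1 0
M108          0 0 1    1 0 1    1 0 0
M113          0 0 1    1 1 0    0 0 1
M114          0 0 1    1 1 0    0 1 0
M170          0 1 0    1 0 1    0 1 0
M186          0 1 0    1 1 1    0 1 0

Table 1: The penetrance tables of all N_2 - 1 = 50 two-locus models. For each model, the three
triples give the rows aa, aA, AA of the penetrance table, and the entries of each triple are the
penetrances at bb, bB, BB. Each model represents a group
of equivalent models under permutations. The representative model is the one with the smallest model
number. The six models studied in Neuman and Rice ("RR, RD, DD, T, Mod, XOR") as well as two
single-locus models ("1L"), the recessive (R) and the interference (I) model, are marked.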
Model    Equivalent to    aa       aA       AA
M31      M15 (Mod)        0 0 0    0 1 1    1 1 1
M47      M23              0 0 0    1 0 1    1 1 1
M63      M7 (1L:D)        0 0 0    1 1 1    1 1 1
M71      M59              0 0 1    0 0 0    1 1 1
M79      M27 (R+R)        0 0 1    0 0 1    1 1 1
M87      M43              0 0 1    0 1 0    1 1 1
M95      M11 (T)          0 0 1    0 1 1    1 1 1
M102     M94              0 0 1    1 0 0    1 1 0
M103     M30              0 0 1    1 0 0    1 1 1
M105     M61              0 0 1    1 0 1    0 0 1
M107     M29              0 0 1    1 0 1    0 1 1
M109     M57              0 0 1    1 0 1    1 0 1
M110     M86              0 0 1    1 0 1    1 1 0
M111     M19              0 0 1    1 0 1    1 1 1
M115     M99              0 0 1    1 1 0    0 1 1
M117     M106             0 0 1    1 1 0    1 0 1
M118     M78              0 0 1    1 1 0    1 1 0
M119     M14              0 0 1    1 1 0    1 1 1
M121     M45              0 0 1    1 1 1    0 0 1
M122     M101             0 0 1    1 1 1    0 1 0
M123     M13              0 0 1    1 1 1    0 1 1
M124     M108             0 0 1    1 1 1    1 0 0
M125     M41              0 0 1    1 1 1    1 0 1
M126     M70              0 0 1    1 1 1    1 1 0
M127     M3 (D+R)         0 0 1    1 1 1    1 1 1
M171     M85              0 1 0    1 0 1    0 1 1
M173     M113             0 1 0    1 0 1    1 0 1
M175     M21              0 1 0    1 0 1    1 1 1
M187     M69              0 1 0    1 1 1    0 1 1
M189     M97              0 1 0    1 1 1    1 0 1
M191     M5               0 1 0    1 1 1    1 1 1
M229     M114             0 1 1    1 0 0    1 0 1
M231     M28              0 1 1    1 0 0    1 1 1
M238     M84              0 1 1    1 0 1    1 1 0
M239     M17              0 1 1    1 0 1    1 1 1
M245     M98              0 1 1    1 1 0    1 0 1
M247     M12              0 1 1    1 1 0    1 1 1
M254     M68              0 1 1    1 1 1    1 1 0
M255     M1 (D+D)         0 1 1    1 1 1    1 1 1
M325     M186             1 0 1    0 0 0    1 0 1
M327     M58              1 0 1    0 0 0    1 1 1
M335     M26              1 0 1    0 0 1    1 1 1
M341     M170             1 0 1    0 1 0    1 0 1
M343     M42              1 0 1    0 1 0    1 1 1
M351     M10              1 0 1    0 1 1    1 1 1
M365     M56 (1L:Ī)       1 0 1    1 0 1    1 0 1
M367     M18              1 0 1    1 0 1    1 1 1
M381     M40              1 0 1    1 1 1    1 0 1
M383     M2               1 0 1    1 1 1    1 1 1
M495     M16              1 1 1    1 0 1    1 1 1

Table 2: The penetrance tables of N_1 - N_2 - 1 = 50 two-locus models, in the same layout as Table 1
(rows aa, aA, AA; entries of each triple at bb, bB, BB). These models are equivalent to
the models in Table 1 by the 0↔1 permutation plus possibly other permutations between two loci and
between two alleles. The most familiar models, including the two single-locus models, the dominant (D)
and the negative interference (Ī) model, are marked.
M11 requires at least three disease alleles, regardless of which locus the disease alleles
are from, to be affected. M95, which is equivalent to M11, requires at least two
disease alleles to be affected.

6. An exclusive OR model (XOR)

M78 is almost the R+R model except for the two-locus genotype AA-BB. This model
was used to model the genetics of handedness […]. In fact, M78 is one of the "exclusive
OR" models to be discussed below.

There are also the following classification schemes:

• Single-locus models (1L):

M7 is a single-locus recessive model (it is also equivalent to a single-locus dominant
model M63, by the 0↔1 permutation in the penetrance table, followed by a permutation
between alleles a and A). M56 is a single-locus "interference" model (the term used by
Johnson is "metabolic interference" […]), or "maximum heterozygosity model". As
discussed in detail by Johnson […], in this hypothetical model neither allele a nor
A is really abnormal; only when the gene products interact can there be harmful
effects. M365 is equivalent to M56 by the 0↔1 permutation (plus a permutation
between two loci), and can be called a "negative interference model" or a "maximum
homozygosity model". Models similar to M56 and M365, which are neither dominant
nor recessive, will be discussed further below. M7, M63, M56, M365 are labeled as R, D, I,
Ī.

We can classify two-locus models which are one mutation away from single-locus
models as almost single-locus models. The modifying-effect model M15 is actually
an almost single-locus model. Others include M23, M57, M58 (0→1 mutation in
the penetrance table), M3, M5, M59, and M61 (1→0 mutation in the penetrance
table).

• Logical AND (multiplicative) models:

The logical AND operation on two binary variables is defined as: 0 AND 0 = 0,
0 AND 1 = 0, 1 AND 0 = 0, 1 AND 1 = 1. Imagine that the penetrance table receives a
contribution from both loci, {g_{1i}} and {g_{2j}} (i, j = 1, 2, 3), and the penetrance value
can be represented as a product of the two contributions […]:

\[
f_{ij} = g_{1i} \ \mathrm{AND} \ g_{2j}.
\]

This class of models includes M1 (RR), M2 (RI), M3 (RD), M5 (RĪ), M16 (II), M18 (DI),
M27 (DD), M40 (IĪ), M45 (DĪ), and M325 (ĪĪ), where R, D, I, Ī are the recessive,
dominant, interference, and negative interference single-locus models. M325 is equivalent
to M186 by the permutation in the affection status. Although M7 and M56 are also
logical AND models, they are actually trivial single-locus models. One can see that
for M45, for example, when the second and third columns in the penetrance table
are switched, all non-zero elements form a rectangular block. It is true for any
multiplicative model that such a rectangular block can be formed by switching columns
and/or rows.

The special interest of multiplicative models lies in the fact that the probability of
the identity-by-descent value at one locus is independent of that at the other locus […].
In other words, if one uses the joint identity-by-descent between affected sibpairs to
study a possible interaction between two locations, such an interaction cannot be
detected. More on the calculation of the probability of identity-by-descent values will
be discussed below.

• Logical OR (heterogeneity) models:

The logical OR operation on two binary variables is defined as: 0 OR 0 = 0, 0 OR
1 = 1, 1 OR 0 = 1, 1 OR 1 = 1. The 0↔1 permutation in the penetrance table
will transform a logical AND model into a logical OR model, or a heterogeneity
model. Note that for fully-penetrant models, we cannot have exact, but only
approximate, additive models in the original sense, since 1+1=2 is larger than
what is allowed for a penetrance.

• Logical XOR models:

The logical XOR (exclusive OR) operation on two binary variables is defined as:
0 XOR 0 = 0, 0 XOR 1 = 1, 1 XOR 0 = 1, 1 XOR 1 = 0. The last equation makes XOR an
extremely non-linear operation. Because of this property, XOR is a favorite function
to illustrate the advantage of artificial neural networks over linear discrimination and
linear regression (e.g. […]). Logical XOR two-locus models include M78 (as discussed
earlier), M113, and M170.

• Conditional dominant (recessive) models: 

These are models where the first (or the second) locus behaves like a dominant (or 
recessive) model if the second (or the first) locus takes a certain genotype. For 
example, the first locus in M11 behaves as a recessive model when the genotype at
the second locus is bB, but as a dominant model when the genotype at the second
locus is BB. Models similar to M11 include: M1 (RR), M2, M3 (RD), M5, M13,
M15 (Mod), M18, M19, M23, and M45.
• Interference models: neither dominant nor recessive:

We can extend the single-locus "neither dominant nor recessive" models M56 and
M365 to two-locus models. In positive interference, two otherwise normal proteins
produced at two loci interact to lead to the disease. In negative interference, two
complementary proteins lead to a functional product and an unaffected person, whereas
the lack of either complementary component leads to affection. The following models
illustrate the situation: M68, M186, and M170.

In M68, the only two-locus genotypes that lead to the disease are aa-BB and AA-bb.
Suppose an abnormal effect is caused by an interaction between the protein product
generated from allele a and that from B, or between the protein products from b and
A. Then only the above two two-locus genotypes lead to the maximum abnormal
effect. This model was studied in [65].

For M325, which is equivalent to M186 by the 0↔1 permutation in the penetrance
table, four two-locus genotypes lead to the disease: aa-bb, aa-BB, AA-bb, AA-BB.
This is a situation where maximum doses of the protein produced at both loci lead to
the disease. From this perspective, M325 is a "maximum homozygosity" model (and
M186 a "maximum heterozygosity" model).

For M170, four two-locus genotypes lead to the disease: aa-bB, aA-bb, aA-BB, AA-bB.
The difference between M170 and M186 is that the double-heterozygous genotype
aA-bB does not lead to the disease, whereas all other heterozygous genotypes
lead to the disease. One might consider that there is another between-locus interference
besides the within-locus interference, and the two interferences cancel out.

In Drosophila genetics, the phenomenon of metabolic interference is called "negative
complementation" […]. For example, the Notch gene has two types, "enhancers"
and "suppressors". The homozygotes for both types are viable, whereas the
heterozygotes are lethal.

The phenomenon of "maternal-fetal incompatibility" is reminiscent of, but not 
identical to, the interference we discuss here. This incompatibility is between the red 
blood cells in the mother and in the fetus, due to the inheritance of two different 
alleles from the mother and the father. This occurs only if the fetus' genotype is 
heterozygous. 

• More modifying-effect models:

Just as M15 is a modified version of the single-locus recessive model, any model whose
penetrance table is one mutation away from a classified model has a modifying effect
on the latter. For example, changing the penetrance value from 1 to 0 in M41 at the
two-locus genotype aA-bb turns it into M9, which is equivalent to the
jointly-recessive-dominant model M3 (hence the entry [3] for M41 in Table 3). Other
modifying-effect models are listed in Table 3.

• Missing lethal genotype models: 

We consider the following situation: a genetic disease requires a minimum number of 
disease alleles from either/both locus/loci (i.e. alleles A and B), which lead to models 
similar to the threshold model (M11 or its equivalent model M95). Nevertheless, if the
disease is lethal, all individuals carrying a large number of disease alleles disappear
from the population. Consequently, it is impossible to observe the two-locus genotypes
with the largest numbers of disease alleles (e.g. AA-BB, AA-bB, aA-BB). Although
all possible two-locus genotypes are specified in the penetrance table, some genotypes
never appear in the population. Effectively, we may replace the penetrances at these
genotypes by "not available" symbols (+'s) or by 0's.

For example, in the penetrance table below, the AA-BB genotype is missing from the
population, thus its penetrance is replaced by a "+":

\[
\begin{array}{c|ccc}
   & bb & bB & BB \\ \hline
aa & 0  & 0  & 0  \\
aA & 0  & 0  & 1  \\
AA & 0  & 1  & +
\end{array}
\qquad (3)
\]



Since we will never have a chance to use the penetrance represented by +, it might
be replaced by a 0, and the table becomes model M10. The following models also belong to this
class: M2, M12, M14, M18, M26, M28, M30, M78, M84, M86, M94, M124 (equivalent
to M108), M126 (equivalent to M70), M254 (equivalent to M68) (the +'s appear in
the lower-right corner), M3, M19 (the +'s appear in the upper-right corner). A model
similar to M84 was discussed in […].

The discussion presented here illustrates a general principle: even if two two-locus 
models may differ in their penetrance table, they can be effectively identical if the 
differing element appears with a very small probability. 

• Highly symmetric models: 

During the discussion of Polya's theorem, eight permutations were listed including 
the identity operation and seven other permutations. Whether a model is invariant 
or not under the seven permutations provides a measure of the degree of symmetry 
of the model. For example, M40 is invariant under three permutations: exchange of 
alleles a and A, exchange of alleles b and B, and exchange of both (a with A and b with B). Other
models which are invariant under a large number of permutations (indicated by the
number in the parentheses) include: M16 (7), M40 (3), M68 (3), M84 (3), M170 (7),
M186 (7). M56 is excluded because it is a single-locus model.

Models that are symmetric with respect to permutation of two loci need only one 
single-locus model to approximate both loci. Models that are symmetric with respect 
to permutation of two alleles might be more relevant to common diseases. 
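
The degree of symmetry quoted above can be checked mechanically. The sketch below is our own illustration, assuming the model numbering of Eq. (2); `symmetry_degree` is a hypothetical helper name. It counts how many of the seven non-identity permutations leave a penetrance table unchanged.

```python
def transformed_models(number):
    """Model numbers obtained from `number` under the eight allele/locus
    exchanges, with the identity included (so the list has eight entries)."""
    bits = format(number, "09b")
    table = [tuple(bits[3 * i:3 * i + 3]) for i in range(3)]
    results = []
    for _ in range(2):
        for _ in range(4):
            # Rotate by 90 degrees; four rotations return to the starting table.
            table = [tuple(row) for row in zip(*table[::-1])]
            results.append(int("".join("".join(row) for row in table), 2))
        table = [tuple(row) for row in zip(*table)]  # exchange the two loci
    return results


def symmetry_degree(number):
    """Number of non-identity permutations under which the model is invariant."""
    return transformed_models(number).count(number) - 1


for model in (16, 40, 68, 84, 170, 186):
    print(model, symmetry_degree(model))
```

The loop reproduces the values listed in the text: 7 for M16, M170 and M186, and 3 for M40, M68 and M84.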

Admittedly, there are "exotic" models which have yet to be classified. Although one 
can relax the definitions of modifying-effect and interference models to incorporate them, 
they are less likely to be useful in modeling the gene-gene interaction in real situations. 
Table 3 summarizes what we have discussed in this section. 
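
The logical AND, OR and XOR classes discussed in this section can be recognized by a direct search. The sketch below is an illustration written for this purpose rather than the procedure used to build Table 3. It looks for single-locus contributions g1 and g2 with f_ij = g1[i] op g2[j], and it assumes that both contributions must be non-constant, so that single-locus models such as M7 are not reported. The name `logical_forms` is hypothetical.

```python
from itertools import product

OPERATIONS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}


def logical_forms(number):
    """Names of the logical operations that reproduce the penetrance table
    of model `number` from two non-constant single-locus contributions."""
    bits = format(number, "09b")
    table = [[int(bits[3 * i + j]) for j in range(3)] for i in range(3)]
    # Non-constant contributions only: exclude (0,0,0) and (1,1,1).
    contributions = [g for g in product((0, 1), repeat=3) if 0 < sum(g) < 3]
    found = set()
    for name, op in OPERATIONS.items():
        for g1, g2 in product(contributions, repeat=2):
            if all(table[i][j] == op(g1[i], g2[j])
                   for i in range(3) for j in range(3)):
                found.add(name)
    return found


for model in (1, 27, 255, 78, 170, 11):
    print(model, sorted(logical_forms(model)))
```

Under these assumptions M1 and M27 are reported as AND models, M255 as an OR model, M78 and M170 as XOR models, and the threshold model M11 as none of the three.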

4 Marginal penetrance tables 

One important question we ask is how a two-locus model differs from a single-locus model. 
This question has practical implications in linkage analyses because almost all current 
analyses are carried out by focusing on one susceptibility gene. We can use the marginal 
penetrance table on each one of the two loci to represent the effective single-locus model as 
the effects of other interacting genes are averaged out. The marginal penetrance table on 
the first locus is f_i^{(1)} = \sum_j P_j f_{ij}, where {P_j} are the genotype frequencies at the second
locus, and that on the second locus is f_j^{(2)} = \sum_i P_i f_{ij}, where {P_i} are the genotype
frequencies at the first locus.

Take the modifying-effect model M15, for example. If p_1 and p_2 are the disease allele
frequencies at the two loci (q_1 = 1 - p_1, q_2 = 1 - p_2, and Hardy-Weinberg equilibrium is
assumed), the penetrance table, labeled with the corresponding genotype frequencies, is:

\[
\begin{array}{c|ccc}
              & bb\,(q_2^2) & bB\,(2p_2q_2) & BB\,(p_2^2) \\ \hline
aa\,(q_1^2)   & 0 & 0 & 0 \\
aA\,(2p_1q_1) & 0 & 0 & 1 \\
AA\,(p_1^2)   & 1 & 1 & 1
\end{array}
\]

The three marginal penetrances at the first locus are (0, p_2^2, 1). As expected, this is very
similar to the recessive model except for a modifying effect on the heterozygote. Similarly,
the three marginal penetrances at the second locus are (p_1^2, p_1^2, p_1^2 + 2p_1q_1), which are almost
zero when p_1 is small. If linkage analysis for markers near both disease genes is carried
out, the marker near the first gene will provide a linkage signal under the recessive model
Model    Classifications
M1       RR, C, AND, S_L, [3,68] (M255 → D+D, OR)
M2       L, C, AND, S_A, [3]
M3       L, RD, C, AND, [1,7,11] (M127 → D+R, OR)
M5       C, AND, S_A, [1,7]
M7       1L:R, S_A, [3] (M63 → 1L:D)
M10      L, S_L, [11]
M11      T, C, S_L, [3,27]
M12      L, [1]
M13      C, [3]
M14      L, [3]
M15      C, [7,11] (M31 → [27])
M16      I, AND, S_{L,A,AA}
M17      S_L, [1,16]
M18      L, C, S_A, AND, [16,56]
M19      L, C, [3,27]
M21      S_A
M23      C, S_A, [7]
M26      S_L, [27]
M27      DD, C, AND, S_L, [11] (M79 → R+R, OR)
M28      L
M29
M30      L
M40      AND, S_{A,AA}, [56]
M41      [3]
M42      S_A, [170]
M43      [11]
M45      C, AND, S_A
M56      1L:I, S_{A,AA} (M365 → 1L:Ī)
M57      [56]
M58      S_A, [56,186]
M59      [27] (M71 → [7])
M61      S_A (M105 → [7])
M68      I, S_{L,AA}, [1] (M254 → L)
M69      S_L, [68] (M187 → [186])
M70      [3,68] (M126 → L)
M78      L, XOR, S_L (M118 → [27])
M84      L, S_{L,AA}, [68]
M85      S_L (M171 → [170])
M86      L
M94      L, S_L (M102 → [11])
M97      S_A
M98      S_L
M99
M101
M106
M108     S_AA (M124 → L)
M113     XOR, S_A
M114     S_L
M170     I, XOR, S_{L,A,AA}, [186]
M186     I, OR, S_{L,A,AA}, [170] (M325 → AND)

Table 3: 1L: single-locus models (D: dominant, R: recessive, I and Ī: interference); RR:
jointly-recessive-recessive model; DD: jointly-dominant-dominant model; RD: jointly-recessive-dominant model;
T: threshold model; I: interference models; L: missing lethal genotype models; C: conditionally dominant
and/or conditionally recessive; AND: logical AND models (multiplicative); OR: logical OR models
(heterogeneity models); XOR: logical XOR models; S: symmetric models (S_L: with respect to permutation of
two loci; S_A: with respect to permutation of two alleles at one locus; S_AA: with respect to permutation
of two alleles at both loci); [ ]: modifying-effect models. For example, [11] indicates a model that modifies
M11 by one bit in the penetrance table.
with a modified (reduced) penetrance; the marker near the second gene will barely provide 
any linkage signal. 
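
The marginal penetrances quoted for M15 can be checked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the nine binary digits of the model number fill the 3-by-3 penetrance table row by row (rows: aa, aA, AA; columns: bb, bB, BB), a convention that is consistent with the M15 example above.

```python
def penetrance_table(model):
    """3x3 penetrance table f[i][j] for a fully-penetrant model number (0..511).

    Assumption: the 9-bit binary form of the model number fills the table
    row by row (first-locus genotype i, second-locus genotype j).
    """
    bits = format(model, "09b")
    return [[int(bits[3 * i + j]) for j in range(3)] for i in range(3)]


def genotype_frequencies(p):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the disease allele."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def marginal_penetrances(model, p1, p2):
    """Average the two-locus penetrance over the genotypes of the other locus."""
    f = penetrance_table(model)
    freq1, freq2 = genotype_frequencies(p1), genotype_frequencies(p2)
    first = [sum(freq2[j] * f[i][j] for j in range(3)) for i in range(3)]
    second = [sum(freq1[i] * f[i][j] for i in range(3)) for j in range(3)]
    return first, second


first, second = marginal_penetrances(15, 0.1, 0.1)
print([round(x, 2) for x in first])    # first locus:  (0, p2^2, 1)
print([round(x, 2) for x in second])   # second locus: (p1^2, p1^2, p1^2 + 2 p1 q1)
```

The same function applied to M78 and M79 gives (0.01, 0.01, 0.99) and (0.01, 0.01, 1) at both loci, the near-identical pair discussed in the text.
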

Assuming $p_1 = p_2 = 0.1$, Table 4 lists the marginal penetrances at both loci for all 
$N_2 - 1 = 50$ two-locus models. Table 5 lists those for the remaining $N_1 - N_2 - 1 = 50$ 
models. Each marginal penetrance table at a single locus is roughly classified as one of 
four types: dominant (D), recessive (R), interference (I), and negative interference (Ī). 
Note that this classification provides only crude guidance on the marginal single-locus effect. 
For example, in Table 4 the marginal penetrance table (0,0.2,0.8) is classified as recessive, 
though it is only approximately recessive with some phenocopy probability. Also note that 
for models that are equivalent to the representative models listed in Tables 3 and 4, the 
marginal penetrances need to be recalculated using the correct allele frequencies. 

Marginal penetrance tables can provide insight into linkage analyses using a single- 
locus model when the underlying disease model involves two genes. For example, for 
M1 (RR), both genes behave like a recessive locus but with a highly reduced penetrance 
(0.01 if the disease allele frequency is 0.1). A single-locus-based linkage analysis might 
detect both loci but with difficulty because of the low penetrance. M78 (an XOR model) 
provides another example. It is almost identical to M79 (R+R) in that both genes behave 
as a recessive locus, but the marginal penetrance is reduced from 1 to 0.99. The almost 
negligible effect with the exclusive OR operation at the AA-BB genotype is due to the fact 
that the population frequency of the AA-BB genotype is very small. In practice, it might 
be very difficult to distinguish M78 from M79 in a single-locus-based linkage analysis. 

It is important to note that Tables 4 and 5 are derived with a particular disease allele 
frequency ($p_1 = p_2 = 0.1$). When the disease allele frequency is the same as the normal 
allele frequency ($p_1 = p_2 = 0.5$), the nature of the marginal single-locus model could be 
completely different. For example, the marginal effect of both loci in M84 is between 
recessive and dominant when $p_1 = p_2 = 0.1$. When $p_1 = p_2 = 0.5$, the marginal penetrance 
becomes (0.25, 0.5, 0.25) at both loci, similar to an interference model. If the penetrance 
$f_{22}$ is 0.5 instead of 1, the marginal penetrance is (0.25,0.25,0.25) |]2B|; in other words, 
there is no marginal linkage signal at all. 
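
The dependence on allele frequency can be illustrated numerically. The following sketch (function names are hypothetical) takes an arbitrary 3-by-3 penetrance table, so that the reduced-penetrance variant of M84 with $f_{22} = 0.5$ can be handled as well; Hardy-Weinberg equilibrium is assumed, as in the text.

```python
def hardy_weinberg(p):
    """Genotype frequencies for 0, 1, 2 copies of an allele with frequency p."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def marginals(f, p1, p2):
    """Marginal penetrances at the first and second locus for a 3x3 table f."""
    freq1, freq2 = hardy_weinberg(p1), hardy_weinberg(p2)
    first = [sum(freq2[j] * f[i][j] for j in range(3)) for i in range(3)]
    second = [sum(freq1[i] * f[i][j] for i in range(3)) for j in range(3)]
    return first, second


# M84: affected genotypes are aa-BB, aA-bB and AA-bb.
m84 = [[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0]]
# Same table with the penetrance of the double heterozygote reduced to 0.5.
m84_reduced = [[0, 0, 1],
               [0, 0.5, 0],
               [1, 0, 0]]

for p in (0.1, 0.5):
    print(p, [round(x, 2) for x in marginals(m84, p, p)[0]])
print("f22 = 0.5:", [round(x, 2) for x in marginals(m84_reduced, 0.5, 0.5)[0]])
```

At $p_1 = p_2 = 0.5$ the reduced table gives a flat marginal, which is the "no marginal linkage signal" case mentioned above.
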

In a practical pedigree analysis, the genotype frequencies may not be taken from the 
population frequencies, but taken from the pedigrees one has |]8^, |90|, |79[. It is thus possible 
that the penetrance table is specific to each individual in the pedigree. It is another way 
of saying that the risk of developing the disease for each family member is conditional on 
the affection status of other family members, and such conditional probability may differ 
from person to person. 
        first locus          second locus                 first locus          second locus
model    aa   aA   AA type    bb   bB   BB type   model    aa   aA   AA type    bb   bB   BB type
M1        0    0  .01          0    0  .01        M43       0  .82  .19  I     .18  .01  .19  Ī
M2        0    0  .18  R       0  .01    0        M45       0  .82  .82  D     .19    0  .19  Ī
M3        0    0  .19  R       0  .01  .01        M56       0    1    0  I     .18  .18  .18
M5        0    0  .82  R     .01    0  .01        M57       0    1  .01  I     .18  .18  .19
M7        0    0    1  R     .01  .01  .01        M58       0    1  .18  I     .18  .19  .18
M10       0  .01  .18  R       0  .01  .18  R     M59       0    1  .19  I     .18  .19  .19
M11       0  .01  .19  R       0  .01  .19  R     M61       0    1  .82  D     .19  .18  .19
M12       0  .01  .81  R     .01    0  .18  R     M68     .01    0  .81  R     .01    0  .81  R
M13       0  .01  .82  R     .01    0  .19  R     M69     .01    0  .82  R     .01    0  .82  R
M14       0  .01  .99  R     .01  .01  .18  R     M70     .01    0  .99  R     .01  .01  .81  R
M15       0  .01    1  R     .01  .01  .19  R     M78     .01  .01  .99  R     .01  .01  .99  R
M16       0  .18    0  I       0  .18    0  I     M84     .01  .18  .81  R     .01  .18  .81  R
M17       0  .18  .01  I       0  .18  .01        M85     .01  .18  .82  R     .01  .18  .82  R
M18       0  .18  .18  D       0  .19    0        M86     .01  .18  .99  R     .01  .19  .81  R
M19       0  .18  .19  D       0  .19  .01        M94     .01  .19  .99  R     .01  .19  .99  R
M21       0  .18  .82  R     .01  .18  .01        M97     .01  .81  .01  I     .18    0  .82  R
M23       0  .18    1  R     .01  .19  .01        M98     .01  .81  .18  I     .18  .01  .81  R
M26       0  .19  .18  D       0  .19  .18  D     M99     .01  .81  .19  I     .18  .01  .82  R
M27       0  .19  .19  D       0  .19  .19  D     M101    .01  .81  .82  D     .19    0  .82  R
M28       0  .19  .81  R     .01  .18  .18  D     M106    .01  .82  .18  I     .18  .01  .99  R
M29       0  .19  .82  R     .01  .18  .19  D     M108    .01  .82  .81  D     .19    0  .99  R
M30       0  .19  .99  R     .01  .19  .18  D     M113    .01  .99  .01  I     .18  .18  .82  R
M40       0  .82    0  I     .18    0  .18  Ī     M114    .01  .99  .18  I     .18  .19  .81  R
M41       0  .82  .01  I     .18    0  .19  Ī     M170    .18  .82  .18  I     .18  .82  .18  I
M42       0  .82  .18  I     .18  .01  .18  Ī     M186    .18    1  .18  I     .18    1  .18  I

Table 4: Marginal penetrance tables at both loci for all $N_2 - 1 = 50$ two-locus models assuming disease 
allele frequencies $p_1 = p_2 = 0.1$. D, R, I, Ī represent (approximately) dominant, recessive, interference, 
and negative interference. The symbol "-" represents the case where the penetrance is not very sensitive 
to changes in the genotype. 






        first locus          second locus                 first locus          second locus
model    aa   aA   AA type    bb   bB   BB type   model    aa   aA   AA type    bb   bB   BB type
M31       0  .19    1  R     .01  .19  .19  D     M171    .18  .82  .19  I     .18  .82  .19
M47       0  .82    1  D     .19  .01  .19  Ī     M173    .18  .82  .82  D     .19  .81  .19
M63       0    1    1  D     .19  .19  .19        M175    .18  .82    1  D     .19  .82  .19
M71     .01    0    1  R     .01  .01  .82  R     M187    .18    1  .19  I     .18    1  .19
M79     .01  .01    1  R     .01  .01    1  R     M189    .18    1  .82  D     .19  .99  .19
M87     .01  .18    1  R     .01  .19  .82  R     M191    .18    1    1  D     .19    1  .19
M95     .01  .19    1  R     .01  .19    1  R     M229    .19  .81  .82  D     .19  .81  .82  D
M102    .01  .81  .99  D     .19  .01  .81  R     M231    .19  .81    1  D     .19  .82  .82  D
M103    .01  .81    1  D     .19  .01  .82  R     M238    .19  .82  .99  D     .19  .82  .99  D
M105    .01  .82  .01  I     .18    0    1  R     M239    .19  .82    1  D     .19  .82    1  D
M107    .01  .82  .19  I     .18  .01    1  R     M245    .19  .99  .82  D     .19  .99  .82  D
M109    .01  .82  .82  D     .19    0    1  R     M247    .19  .99    1  D     .19    1  .82  D
M110    .01  .82  .99  D     .19  .01  .99  R     M254    .19    1  .99  D     .19    1  .99  D
M111    .01  .82    1  D     .19  .01    1  R     M255    .19    1    1  D     .19    1    1  D
M115    .01  .99  .19  I     .18  .19  .82  R     M325    .82    0  .82  Ī     .82    0  .82  Ī
M117    .01  .99  .82  D     .19  .18  .82  R     M327    .82    0    1  Ī     .82  .01  .82  Ī
M118    .01  .99  .99  D     .19  .19  .81  R     M335    .82  .01    1  Ī     .82  .01    1  Ī
M119    .01  .99    1  D     .19  .19  .82  R     M341    .82  .18  .82  Ī     .82  .18  .82  Ī
M121    .01    1  .01  I     .18  .18    1  R     M343    .82  .18    1  Ī     .82  .19  .82  Ī
M122    .01    1  .18  I     .18  .19  .99  R     M351    .82  .19    1  Ī     .82  .19    1  Ī
M123    .01    1  .19  I     .18  .19    1  R     M365    .82  .82  .82          1    0    1  Ī
M124    .01    1  .81  D     .19  .18  .99  R     M367    .82  .82    1          1  .01    1  Ī
M125    .01    1  .82  D     .19  .18    1  R     M381    .82    1  .82          1  .18    1  Ī
M126    .01    1  .99  D     .19  .19  .99  R     M383    .82    1    1          1  .19    1  Ī
M127    .01    1    1  D     .19  .19    1  R     M495      1  .82    1          1  .82    1

Table 5: Similar to Table 4, but for the $N_1 - N_2 - 1 = 50$ two-locus models that are equivalent to the models 
in Table 4 by switching the affection status and possibly other permutations between loci and alleles. 
5 IBD probabilities in two-locus models 

There is a growing interest in using identity-by-descent (IBD) sharing between affected 
sib pairs or affected relative pairs to test whether a marker is linked to a susceptibility 
gene. The premise behind the IBD test is that affected sib pairs or affected relative 
pairs should share more alleles IBD near the region of the disease gene than expected from 
random segregation. IBD sharing at one location is usually determined regardless of IBD 
sharing at other chromosomal locations; in other words, a single-locus model is implicitly 
assumed. To test for possible interactions between two regions, joint IBD sharing is needed 
13,^,1611,11. 



The observed joint IBD sharing can be compared with the expected IBD sharing under 
a certain model. There are at least three approaches to determining the expected joint 
IBD sharing probability at two loci between two affected sibs or affected relatives given 
a disease model. The first is to list all mating types, and count the number of each 
sharing situation among all possibilities. The second is to calculate the covariance of a 
quantitative trait between two relatives M, 44, H5l. This covariance is decomposed into 
the sum of the products of the "coefficient of parentage" (or kinship coefficient) and the 
variance components. The latter include additive and dominance variance components obtained 
by a linear regression of the quantitative trait on the number of alleles p5|. The conversion from 
the covariance of a quantitative trait to the IBD sharing between affected relatives can be 
accomplished by Bayes' theorem. The third, and perhaps the most elegant, approach is to 
use Bayes' theorem to convert the probability of IBD sharing, given that the two relatives 
are affected, to the probability of two relatives being affected, given the IBD sharing. This 
approach was first developed by Li and Sacks in 1954 |Q p] . 



In Li-Sacks' original approach, a set of conditional probabilities, the probability that the 
second relative has a certain genotype given that the first relative has a certain genotype, is 
conveniently written in three 3-by-3 matrices ("Li-Sacks matrices") or four 4-by-4 matrices 
0. These approaches were modified in |^ by using two 2-by-2 matrices, which are the 
conditional probabilities that the second relative has a certain allele derived from one 
parent, given that the first relative has a certain allele derived from the same parent. In 
this formulation, the probability that the two affected sibs share $k_{1m}$ maternal alleles IBD 
and $k_{1p}$ paternal alleles IBD at the first locus, and $k_{2m}$ maternal alleles IBD and $k_{2p}$ 
paternal alleles IBD at the second locus, is 

$$P(k_{1m}, k_{1p}, k_{2m}, k_{2p} \mid \text{both sibs affected}) = \frac{\text{numerator } N}{\text{denominator } D}$$
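with 

$$N = \sum_{i_{1m}, i_{1p}, i_{2m}, i_{2p}, j_{1m}, j_{1p}, j_{2m}, j_{2p}} f_{j_{1m} j_{1p} j_{2m} j_{2p}} \cdot t_{i_{1m} j_{1m}}(k_{1m})\, t_{i_{1p} j_{1p}}(k_{1p})\, t_{i_{2m} j_{2m}}(k_{2m})\, t_{i_{2p} j_{2p}}(k_{2p})$$
$$\qquad \cdot f_{i_{1m} i_{1p} i_{2m} i_{2p}}\, p_{i_{1m}} p_{i_{1p}} p_{i_{2m}} p_{i_{2p}} \cdot p(k_{1m})\, p(k_{1p})\, p(k_{2m})\, p(k_{2p})$$
$$D = (\text{sum of } N \text{ over } k_{1m}, k_{1p}, k_{2m}, k_{2p}) \qquad (5)$$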

where 

• $i_{1m}$ is the index for the maternally derived allele (the paternally derived allele uses 
the label $p$), in the first sib (the second sib uses the label $j$), at the first locus (the second 
locus uses the label 2). 

• $f_{i_{1m} i_{1p} i_{2m} i_{2p}}$ and $f_{j_{1m} j_{1p} j_{2m} j_{2p}}$ are the penetrance tables of the two-locus model. 
Although each has 4 indices, it can be easily obtained from the 3-by-3 penetrance table as 
in Eq.|I[ 

• $p_{i_{1m}}, p_{i_{1p}}, p_{i_{2m}}, p_{i_{2p}}$ are the allele frequencies, which take the value of either $p_i$ or 
$q_i = 1 - p_i$. 

• $p(k_{1m}), p(k_{1p}), p(k_{2m}), p(k_{2p})$ are the prior probabilities of sharing alleles IBD at four 
places (maternally and paternally derived, first and second locus), which are all 1/2 
for sib pairs. 

• $t_{i_{1m} j_{1m}}(k_{1m}), t_{i_{1p} j_{1p}}(k_{1p}), t_{i_{2m} j_{2m}}(k_{2m}), t_{i_{2p} j_{2p}}(k_{2p})$ are the revised 2-by-2 Li-Sacks 
matrices given by: 

Despite the complicated indexing, the revised Li-Sacks approach is easier to implement 
in computer code, and easier to generalize to other situations, such as unilineal relative 
pairs, multiple alleles, unaffected-unaffected and unaffected-affected pairs, the probability 
of identity-by-state, two markers instead of two disease genes, etc. |^^. More details will 
be discussed elsewhere [Li, in preparation]. 

There are two types of joint IBD measurements currently in use. The first is the sum 
of maternal and paternal IBDs, which takes the values 0, 1, 2: 

$$P_{geno}(k_1, k_2) = \sum_{k_{1m}+k_{1p}=k_1,\; k_{2m}+k_{2p}=k_2} P(k_{1m}, k_{1p}, k_{2m}, k_{2p}). \qquad (7)$$
The genotypic IBDs, $\{P_{geno}(k_1, k_2)\}$, form a 3-by-3 matrix. The second measurement 
focuses on maternal (or equivalently, paternal) IBD only: 

$$P_{alle}(k_{1m}, k_{2m}) = \sum_{k_{1p},\, k_{2p}} P(k_{1m}, k_{1p}, k_{2m}, k_{2p}). \qquad (8)$$

The symmetry between the maternally-derived and paternally-derived alleles implies that 
$P(k_{1p}, k_{2p}) = P(k_{1m}, k_{2m})$. The allelic IBDs, $\{P_{alle}(k_{1m}, k_{2m})\}$, form a 2-by-2 matrix, 
which will be the joint IBD measurement we use. For example, for M15 at $p_1 = p_2 = 0.1$, 
the joint allelic IBD is: 

                      k_2m = 0    k_2m = 1    marginal k_1m 
    k_1m = 0          0.050549    0.072689    0.123238 
    k_1m = 1          0.413962    0.462800    0.876762          (9) 
    marginal k_2m     0.464511    0.535489    1 

The marginal probabilities of IBD sharing in Eq. (9) confirm our intuition that there is a 
strong preference for the IBD sharing at the first locus to be 1 (probability of sharing 
0.876762 versus non-sharing 0.123238), whereas the deviation from 0.5 at the second locus 
is very small (0.535489 versus 0.464511). 
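
The joint allelic IBD matrix for M15 can be recomputed by brute-force summation of the numerator $N$. The sketch below is not the u2 program; its function names are hypothetical, and it rests on three assumptions that should be checked against the original formulation: (i) the 2-by-2 matrix $t_{ij}(k)$ is the identity when $k = 1$ and has the allele frequencies as each row when $k = 0$; (ii) the two loci are unlinked; (iii) the model number encodes the penetrance table row by row in binary.

```python
from itertools import product


def penetrance(model, g1, g2):
    """Penetrance (0 or 1) for g1, g2 copies of the disease allele at loci 1, 2."""
    bits = format(model, "09b")          # assumed row-by-row binary encoding
    return int(bits[3 * g1 + g2])


def allele_freq(allele, p):
    """Frequency of an allele: 1 = disease allele (frequency p), 0 = normal."""
    return p if allele == 1 else 1.0 - p


def t(i, j, k, p):
    """Assumed 2x2 matrix: P(second sib carries j | first sib carries i, IBD = k)."""
    if k == 1:
        return 1.0 if i == j else 0.0    # shared allele is identical
    return allele_freq(j, p)             # unshared allele is a fresh population draw


def joint_allelic_ibd(model, p1, p2):
    """2x2 matrix P_alle(k1m, k2m) for affected sib pairs."""
    p_loc = (p1, p1, p2, p2)             # order of indices: 1m, 1p, 2m, 2p
    numer = {}
    for k in product((0, 1), repeat=4):
        total = 0.0
        for i in product((0, 1), repeat=4):          # alleles of the first sib
            f_i = penetrance(model, i[0] + i[1], i[2] + i[3])
            if f_i == 0:
                continue
            w_i = 1.0
            for a, p in zip(i, p_loc):
                w_i *= allele_freq(a, p)
            for j in product((0, 1), repeat=4):      # alleles of the second sib
                f_j = penetrance(model, j[0] + j[1], j[2] + j[3])
                if f_j == 0:
                    continue
                w_t = 1.0
                for a, b, kk, p in zip(i, j, k, p_loc):
                    w_t *= t(a, b, kk, p)
                total += f_j * w_t * f_i * w_i
        numer[k] = total * 0.5 ** 4                  # prior 1/2 at each of 4 places
    denom = sum(numer.values())
    alle = [[0.0, 0.0], [0.0, 0.0]]
    for (k1m, k1p, k2m, k2p), n in numer.items():
        alle[k1m][k2m] += n / denom                  # sum over paternal sharing
    return alle


for row in joint_allelic_ibd(15, 0.1, 0.1):
    print(["%.6f" % x for x in row])
```

Under these assumptions the output should agree with the matrix quoted above to about six decimal places, which gives some confidence that the assumed form of $t_{ij}(k)$ matches the one used in the text.
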



6 Correlation between IBD sharings at two loci 

For probabilities of joint IBD sharing at two loci as exemplified by Eq. (9), we ask the 
following question: Can the joint probability be derived from the two marginal IBD sharing 
probabilities at the two separate loci? This question is motivated by the suggestion 
in [10] that one might first detect marginal effects by single-locus linkage analysis, 
then detect interaction later using a correlation analysis. Such a correlation between 
two marginals exists only if the joint probability is not equal to the product of the two 
marginals. Statistical correlations can be measured in different ways, one of them being 
the mutual information, defined as[^^ |4 

$$M = \sum_{k_{1m},\, k_{2m}} P(k_{1m}, k_{2m}) \log_2 \frac{P(k_{1m}, k_{2m})}{P(k_{1m}, \cdot)\, P(\cdot, k_{2m})} \qquad (10)$$

where $P(k_{1m}, \cdot)$ and $P(\cdot, k_{2m})$ are the two marginal IBD sharing probabilities at the two loci. 
Mutual information has certain meaning in information theory, and is intrinsically related 
to the concept of entropy. Two is chosen as the base of the logarithm so that it is measured 
by the unit of "bit" , though base e and base 10 can also be used. 
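
As an illustration of Eq. (10), the sketch below computes the mutual information, in bits, of a joint probability table. The function name is hypothetical, and the input is the M15 matrix quoted earlier, entered by hand.

```python
from math import log2


def mutual_information(joint):
    """Mutual information (bits) of a joint probability table given as nested lists."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    m = 0.0
    for i, row in enumerate(joint):
        for j, p_ij in enumerate(row):
            if p_ij > 0.0:                       # 0 * log(0) is taken as 0
                m += p_ij * log2(p_ij / (row_marg[i] * col_marg[j]))
    return m


# Joint allelic IBD for M15 at p1 = p2 = 0.1 (rows: k1m = 0, 1; columns: k2m = 0, 1).
m15 = [[0.050549, 0.072689],
       [0.413962, 0.462800]]
print("%.1e" % mutual_information(m15))

# A product of marginals carries no mutual information.
independent = [[0.2 * 0.3, 0.2 * 0.7],
               [0.8 * 0.3, 0.8 * 0.7]]
print(abs(mutual_information(independent)) < 1e-12)
```
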
We calculate the mutual information for the 2-by-2 joint probabilities of allelic IBD 
sharing at two loci for all 50 two-locus models, at 3 different allele frequency values: 
$p_1 = p_2 = 0.001$, 0.01, and 0.1. Also shown is an asymmetric situation with $p_1 = 0.1$ and 
$p_2 = 0.01$. The results are summarized in Table 6 (and Table 7 for the other 50 models). 
Only one significant digit is kept in Tables 6 and 7. 



Table 6 confirms the conclusion in [39] that for multiplicative models, the IBD sharing 
probability at one locus can be calculated as if there is no interaction with another locus: 
the correlation as measured by mutual information is 0 for all these models. 

It should be of interest to examine which two-locus models exhibit the smallest correlation, 
and which the largest. Besides the zero correlation for multiplicative and single-locus 
models, all modifying-effect models altered from a single-locus model or a multiplicative 
model should exhibit small correlations. Indeed, in Table 6, we see that at $p_1 = p_2 = 0.001$, 
M19, M26, M41, M57, M58, M59, M61 all exhibit close-to-zero correlations. 

From Tables 6 and 7, it seems that missing lethal genotype models tend to have larger 
correlation values, although these values are derived from a limited choice of parameter 
settings. To some extent, this observation is not surprising. Missing lethal genotype 
models are typically "non-linear" in the sense that as the sum of the total number of 
disease alleles is increased, the change in phenotype is not monotonic (it can first change 
from unaffected to affected, then from affected to unaffected). For these models, using the 
joint IBD sharing probability to detect linkage should have the greatest increase of power 
over methods using marginal probability of IBD sharing. 

Occasionally, we would like to know not only the "strength" or "magnitude" of the 
correlation between the marginal IBD sharing probabilities at two loci, but also the sign 
of the correlation. For example, in |£0|, |T0[, whether the statistical correlation between 
two linkage signals obtained at two loci is positive or negative provides an indication 
as to whether the two loci are "interacting" or simply heterogeneous. We provide this 
piece of information for all two-locus models in Tables 6 and 7. A "(P)" indicates that 
$P(k_{1m} = 1, k_{2m} = 1)$ is larger than the value expected under no correlation, 
$P(k_{1m} = 1) \cdot P(k_{2m} = 1)$; similarly, an "(N)" indicates that the joint probability is smaller 
than the product of the two marginals. As expected, all heterogeneity models (M79, M127, M255) 
have negative correlations. 
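
The (P)/(N) labelling rule can be written down directly. In this sketch the function name and the tolerance used to report a zero correlation are illustrative choices, not taken from the text.

```python
def correlation_sign(joint, tol=1e-12):
    """Return 'P', 'N' or '0' for a 2x2 joint IBD table.

    joint[k1m][k2m] is the joint sharing probability; the sign compares
    P(k1m = 1, k2m = 1) with the product of the two marginal sharing probabilities.
    """
    share_first = joint[1][0] + joint[1][1]      # P(k1m = 1)
    share_second = joint[0][1] + joint[1][1]     # P(k2m = 1)
    diff = joint[1][1] - share_first * share_second
    if abs(diff) <= tol:
        return "0"
    return "P" if diff > 0 else "N"


# Applied to the M15 matrix quoted earlier (p1 = p2 = 0.1).
m15 = [[0.050549, 0.072689],
       [0.413962, 0.462800]]
print(correlation_sign(m15))
```
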

Note that we measure the correlation by a probability-based quantity rather than a 
statistics-based one. This is because we start with a theoretical model, i.e. a two-locus 
model, and investigate the consequences of the model. On the other hand, if we start with a 
sample of size $N$ and the count of joint IBD status $ij$ is $N_{ij}$ ($\sum_{ij} N_{ij} = N$), we can use any 
Table 6: Values of mutual information (with one significant digit) between the two marginal probabilities 
of IBD sharing for all $N_2 - 1 = 50$ two-locus models. The allele frequencies are chosen at four different 
values: $p_1 = p_2 = 0.001$, 0.01, 0.1; $p_1 = 0.1$ and $p_2 = 0.01$. Values lower than 10~^^ are converted to 0. 
"4e-5" means $4 \times 10^{-5}$, etc. Multiplicative models are marked by *. 



Table 7: Similar to Table 6 but for the $N_1 - N_2 - 1 = 50$ models that are equivalent to the models in 
Table 6 by switching affection status. 
one of several statistics to test the significance of the correlation; for example, the 
likelihood-ratio statistic, 

$$G^2 = 2 \sum_{ij} N_{ij} \ln \frac{N_{ij}\, N}{N_{i\cdot}\, N_{\cdot j}},$$

and the Pearson chi-square statistic, 

$$X^2 = \sum_{ij} \frac{\left(N_{ij} - N_{i\cdot} N_{\cdot j}/N\right)^2}{N_{i\cdot} N_{\cdot j}/N},$$

where $N_{i\cdot} = \sum_j N_{ij}$ and $N_{\cdot j} = \sum_i N_{ij}$ are the two marginal counts. It can be shown 
(see Appendix 2) that $G^2$ and $X^2$ are approximately equal. Under the no-correlation null 
hypothesis, both $G^2$ and $X^2$ approximately follow the $\chi^2$ distribution with 1 degree of 
freedom. The larger the $G^2$ and $X^2$, the more likely that the null hypothesis is wrong. 

It is important to note that if the null hypothesis is indeed incorrect, both $G^2$ and $X^2$ 
increase with the sample size $N$. Consequently, $G^2$ and $X^2$ do not measure the strength 
of the correlation, but the evidence that the no-correlation hypothesis is wrong. On the other 
hand, normalized quantities such as $\sqrt{G^2/N}$ and $\sqrt{X^2/N}$ ("phi coefficient", page 741 
of [Q. or Cramer's V, page 631 of |T^) do measure the correlation strength. Compared 
with the mutual information defined in Eq. (10), we see that $G^2/N \approx 2\log(2) M$. 
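
The relation between $G^2$, $X^2$ and the mutual information can be checked on a table of counts. The sketch below assumes the standard definitions of the likelihood-ratio and Pearson statistics for a two-way table; the counts are made up for illustration and are not data from any study.

```python
from math import log, log2


def g2_and_x2(counts):
    """Likelihood-ratio G^2 and Pearson X^2 for a two-way table of counts."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    g2 = 0.0
    x2 = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if n_ij > 0:
                g2 += 2.0 * n_ij * log(n_ij / expected)
            x2 += (n_ij - expected) ** 2 / expected
    return g2, x2, n


def empirical_mutual_information(counts):
    """Mutual information (bits) of the empirical frequencies N_ij / N."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    return sum((n_ij / n) * log2(n_ij * n / (row_tot[i] * col_tot[j]))
               for i, row in enumerate(counts)
               for j, n_ij in enumerate(row) if n_ij > 0)


counts = [[60, 65], [410, 465]]      # hypothetical joint IBD counts, N = 1000
g2, x2, n = g2_and_x2(counts)
m_hat = empirical_mutual_information(counts)
print(g2, x2)                         # the two statistics are close
print(g2 / n, 2.0 * log(2.0) * m_hat) # G^2/N versus 2 log(2) M
```

When $M$ is computed from the observed frequencies, as here, $G^2/N$ equals $2\log(2) M$ exactly; the relation is approximate when $M$ is taken from a theoretical model.
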



7 Discussions 



We present a complete enumeration and an attempt at classification of 512 two-locus 
two-allele fully-penetrant disease models. Excluding zero-locus and single-locus models, 
the minimum set of non-redundant two-locus models is 48, and with the two single-locus 
models included, 50. Even though the permutation of affection status does not change the 
"nature" of the interaction between two genes, for many practical applications, it is helpful 
to keep 50 other models which are equivalent to the first 50 models by this permutation 
in the penetrance table (plus possibly other permutations between alleles and loci). For 
example, a logical OR model (heterogeneity model) is equivalent to a logical AND model 
(multiplicative model). Nevertheless, the special property for a multiplicative model, that 
the joint IBD sharing probability is equal to the product of two marginal IBD probabilities, 
does not hold for a heterogeneity model. Even with our total 100 non-redundant models, 
the permutations between alleles or loci require a corresponding change of allele frequencies 
in some calculations. 

One of the main purposes of this paper is to point out that besides the 6 two-locus disease 
models typically used in linkage analysis assuming two interacting genes, there are many 
other types of gene-gene interactions. On one hand, we admit that many of the two-locus 
models may not describe a real interaction between two gene products in a genetic disease; 
on the other hand, it is fairly straightforward to construct a biochemical system based on 
a two-locus model. A prototypical biochemical system consists of proteins formed by 
one peptide, dimer proteins formed by two complementary peptides, and dimer proteins 
formed by two identical peptides. By specifying the functional and non-functional proteins 
as well as the level of protein concentration required by a normal phenotype, it is possible 
to materialize any two-locus model. 

The marginal penetrance table we calculated in this paper is relevant to linkage analysis 
using only single-locus models. There have been discussions of whether single-locus models 
are sufficient to detect a linkage signal even if the underlying disease model may involve 
gene-gene interaction |3|, ^, §g, ^, 0, |7S|, |T3|, |g, 0. Part of the answer can 
be predicted by the marginal penetrance table: if the marginal penetrance table is clearly 
dominant or recessive, it is possible that a single-locus model is able to detect linkage; 
otherwise, two-locus models should offer more power. Although it was mentioned that the 
gain in the logarithm of the likelihood ratio (same as log-of-odds, or LOD, scores) from using 
two-locus models over single-locus models may be at most 17% [79], after removing 
the logarithm, the increase of the likelihood ratio can be much larger. For example, if the 
LOD score equals 2, i.e. the likelihood ratio is equal to 100, an increase in LOD of 17% 
is equivalent to an increase in likelihood ratio of 118%! What is considered "more" 
powerful versus "slightly more" powerful is not specified. 
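
The arithmetic behind this example is short enough to spell out. The function name below is hypothetical.

```python
def likelihood_ratio_increase(lod, relative_lod_gain):
    """Percentage increase in the likelihood ratio when a LOD score grows by a given fraction."""
    lr_before = 10.0 ** lod
    lr_after = 10.0 ** (lod * (1.0 + relative_lod_gain))
    return 100.0 * (lr_after / lr_before - 1.0)


# LOD = 2 (likelihood ratio 100) with a 17% gain in LOD, i.e. LOD = 2.34.
print(round(likelihood_ratio_increase(2.0, 0.17), 1))
```

The same relative gain matters more at higher LOD scores: the exponent of the ratio is the absolute LOD gain, which grows in proportion to the starting LOD.
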

As a compromise between detecting linkage signals using single-locus models and using 
two-locus models, it is suggested that a pairwise correlation between linkage signals 
obtained by single-locus models can be used to detect linkage for interacting genes |T0|. 
A similar idea for detecting higher-order correlations among linkage signals from different 
locations using artificial neural networks is discussed in [Q. Our result on the sign and 
strength of correlation between two marginal IBD sharing probabilities (Tables 6 and 7) 
is directly relevant to this approach. We observed that models modified from the 
multiplicative and single-locus models exhibit a very weak correlation, whereas missing lethal 
genotype models or "non-linear" models exhibit the strongest correlation. Since many 
two-locus models share similar correlation values, in both sign and magnitude, we may not be 
able to distinguish them using this approach. 

There are many topics on two-locus disease models that are not discussed here. Some 
classification schemes discussed in [56] are not included (e.g. models that are conditionally 
dominant or recessive with respect to two loci), as well as the idea of genotype-induced 
representation of joint IBD distributions (Reich, unpublished results), and the idea of 
"phase transition" in the two-locus model space (Li, unpublished results). The extension 
from fully-penetrant models to reduced-penetrance models, as well as to models for 
quantitative traits, is very important since many complex diseases are not dichotomous. Many 
calculations presented in this paper are implemented in a computer program: u2, for 
"utility program for two-locus models". More information on this program can be found at 
the web page http://linkage.rockefeller.edu/soft/u2. 



Appendices 



1. A formal derivation of the value of $N_2$ by de Bruijn's theorem 

Let us consider two permutations applied to the phenotypes: the identity operation and 
the exchange permutation. The cycle index of this permutation group on the phenotypes 
is: 

$$C_{pheno} = \frac{1}{2}\left(x_1^2 + x_2\right).$$

By de Bruijn's generalization of Polya's theorem (theorem 5.4 in ||T3[), when the 
permutation group on phenotypes is considered, the number of non-equivalent two-locus models 
can be obtained by the following procedure: replacing $x_1$ in $C_{geno}$ by the partial 
derivative $\partial/\partial x_1$, $x_2$ by $\partial/\partial x_2$, etc., and applying the partial derivatives to $C_{pheno}$ while 
replacing $x_1$ with $e^{x_1+x_2+x_3+\cdots}$, $x_2$ with $e^{2(x_2+x_4+\cdots)}$, then evaluating the expression at 
$x_1 = x_2 = \cdots = 0$: 

$$N_2 = \frac{1}{8}\left[\frac{\partial^9}{\partial x_1^9} + 4\,\frac{\partial^3}{\partial x_1^3}\frac{\partial^3}{\partial x_2^3} + \frac{\partial}{\partial x_1}\frac{\partial^4}{\partial x_2^4} + 2\,\frac{\partial}{\partial x_1}\frac{\partial^2}{\partial x_4^2}\right] \frac{1}{2}\left[e^{2(x_1+x_2+x_3+x_4)} + e^{2(x_2+x_4)}\right]\Bigg|_{x_1=x_2=\cdots=0} = 51.$$

Since the permutation group on the phenotype considered here is particularly simple, 
$N_2$ is simply $N_1$ divided by 2. 
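
The counts $N_1 = 102$ and $N_2 = 51$ can also be confirmed by direct enumeration rather than by the cycle-index calculation. The sketch below lists, for each of the 512 penetrance tables, all images under the permutations of alleles, loci and (optionally) affection status, and counts the distinct orbits. The helper names are hypothetical, and the row-by-row binary encoding of the model number is assumed.

```python
from itertools import product


def table_from_number(model):
    """3x3 penetrance table (tuple of tuples) from a model number 0..511."""
    bits = format(model, "09b")
    return tuple(tuple(int(bits[3 * i + j]) for j in range(3)) for i in range(3))


def number_from_table(table):
    """Inverse of table_from_number."""
    return int("".join(str(b) for row in table for b in row), 2)


def images(table, swap_phenotype):
    """All images of a table under allele swaps, locus swap, and optional affection swap."""
    results = []
    for flip_a, flip_b, swap_loci in product((False, True), repeat=3):
        m = table
        if flip_a:                       # exchange the two alleles at the first locus
            m = m[::-1]
        if flip_b:                       # exchange the two alleles at the second locus
            m = tuple(row[::-1] for row in m)
        if swap_loci:                    # exchange the two loci (transpose)
            m = tuple(zip(*m))
        results.append(m)
        if swap_phenotype:               # exchange affected and unaffected
            results.append(tuple(tuple(1 - b for b in row) for row in m))
    return results


def count_classes(swap_phenotype):
    """Number of equivalence classes among the 512 fully-penetrant models."""
    representatives = {
        min(number_from_table(m) for m in images(table_from_number(n), swap_phenotype))
        for n in range(512)
    }
    return len(representatives)


print(count_classes(False), count_classes(True))
```
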
2. Approximate equivalence between $G^2$ and $X^2$ 

If we write $J_{ij} = N_{ij}/N$, $S_{ij} = N_{i\cdot} N_{\cdot j}/N^2$, and assume the difference between the two, 
$\Delta_{ij} = J_{ij} - S_{ij}$, is small, the following approximation by a Taylor expansion, 

shows that $G^2$ and $X^2$ are approximately equal [0]. 
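
A sketch of such an expansion, assuming the standard forms $G^2 = 2N \sum_{ij} J_{ij} \ln (J_{ij}/S_{ij})$ and $X^2 = N \sum_{ij} \Delta_{ij}^2 / S_{ij}$ and keeping terms up to second order in $\Delta_{ij}$, may run as follows. It uses $\sum_{ij} \Delta_{ij} = 0$, which holds because both $J_{ij}$ and $S_{ij}$ sum to one.

```latex
\begin{align*}
G^2 &= 2N \sum_{ij} (S_{ij} + \Delta_{ij}) \ln\left(1 + \frac{\Delta_{ij}}{S_{ij}}\right) \\
    &\approx 2N \sum_{ij} (S_{ij} + \Delta_{ij})
        \left(\frac{\Delta_{ij}}{S_{ij}} - \frac{\Delta_{ij}^2}{2 S_{ij}^2}\right) \\
    &\approx 2N \sum_{ij} \left(\Delta_{ij} + \frac{\Delta_{ij}^2}{2 S_{ij}}\right)
     = N \sum_{ij} \frac{\Delta_{ij}^2}{S_{ij}} = X^2 .
\end{align*}
```

The leading correction is of third order in $\Delta_{ij}$, so the two statistics differ noticeably only when the observed table is far from the product of its marginals.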